• Knowledge graphs link key entities in a specific domain with other entities via relationships.

• Researchers can then query these graphs to get probabilistic recommendations and to infer new knowledge.
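As a minimal sketch of the idea, a knowledge graph can be held as a list of (subject, relation, object) triples and queried by following relationships. The triples, relation names, and helper functions below are illustrative assumptions, not content from the poster, and the sketch does no probabilistic scoring.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object), chosen only for illustration.
TRIPLES = [
    ("Dust Storm", "observed_by", "MODIS"),
    ("MODIS", "onboard", "Terra"),
    ("MODIS", "onboard", "Aqua"),
    ("MODIS", "measures", "Aerosol Optical Depth"),
]


def build_graph(triples):
    """Index triples by subject so outgoing relationships are easy to look up."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def neighbors(graph, entity, relation=None):
    """Entities directly linked from `entity`, optionally filtered by relation."""
    return [obj for rel, obj in graph.get(entity, [])
            if relation is None or rel == relation]


def two_hop(graph, entity):
    """Entities reachable in exactly two steps: a crude way to surface implied links."""
    return sorted({far
                   for _, mid in graph.get(entity, [])
                   for _, far in graph.get(mid, [])})


graph = build_graph(TRIPLES)
print(neighbors(graph, "MODIS", "onboard"))
print(two_hop(graph, "Dust Storm"))
```

The two-hop query hints at how new knowledge might be inferred: a phenomenon linked to an instrument is indirectly linked to the platforms and variables of that instrument.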
Knowledge Graphs for Earth Science?


Why Does the Research Community Need Knowledge Graphs?


• An untapped resource of knowledge for a given domain is stored in papers and technical reports (unstructured).

• It is difficult to extract and infer knowledge at scale.
Methodology to Build Knowledge Graphs
* Consists of two stages:
  • Development of heuristic algorithms to perform semantic entity identification (Phenomena, Dataset, Instrument, Variable (Physical Property), ...) to assist human experts in building training data [Steps 0-2] [Focus of this poster]
  • Use of deep learning algorithms to improve results [Steps 3-7]


Heuristic Algorithm Development Strategy
• Explore the use of existing taxonomies (GCMD, CF, SWEET)


relevant datasets to papers 
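One simple way to use a taxonomy for entity identification is dictionary lookup: scan the paper text for known taxonomy terms and tag each hit with its entity type. The sketch below assumes a tiny hand-written term list standing in for GCMD, CF, or SWEET; the terms, the `TAXONOMY` layout, and the function names are hypothetical, and the poster does not state that its heuristics work this way.

```python
import re

# Hypothetical excerpt of taxonomy terms keyed by entity type (not real GCMD/CF/SWEET exports).
TAXONOMY = {
    "Instrument": ["MODIS", "CALIOP", "AIRS"],
    "Variable": ["aerosol optical depth", "sea surface temperature"],
    "Phenomena": ["hurricane", "dust storm"],
}


def compile_patterns(taxonomy):
    """Build one case-insensitive regex per entity type."""
    patterns = {}
    for entity_type, terms in taxonomy.items():
        # Longest terms first so multi-word terms win over shorter overlapping ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[entity_type] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def find_entities(text, patterns):
    """Return (start, end, entity_type, matched_text) tuples sorted by position."""
    hits = []
    for entity_type, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), entity_type, match.group(0)))
    return sorted(hits)


patterns = compile_patterns(TAXONOMY)
sentence = "MODIS aerosol optical depth increased during the dust storm."
for hit in find_entities(sentence, patterns):
    print(hit)
```

Exact-match lookup of this kind only finds terms that appear in the taxonomy as written, so its coverage is bounded by the quality of the taxonomy.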
Extraction Results

[Figure panels: Variable TF-IDF; Instrument Term Frequency]
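The labels above pair variables with TF-IDF and instruments with term frequency. The poster gives no formulas, so the following is only a sketch of the two scores under common definitions: term frequency as a share of all mentions in a paper, and TF-IDF using the smoothed inverse document frequency log((1 + N) / (1 + df)) + 1. The function names and sample mentions are assumptions.

```python
import math
from collections import Counter


def term_frequency(mentions):
    """Share of each term among all extracted mentions in one paper."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def tf_idf(doc_mentions, corpus_mentions):
    """Weight a paper's terms by how rare they are across the corpus.

    corpus_mentions is a list of mention lists, one per paper.
    """
    n_docs = len(corpus_mentions)
    # Document frequency: in how many papers each term appears at least once.
    doc_freq = Counter(term for doc in corpus_mentions for term in set(doc))
    tf = term_frequency(doc_mentions)
    return {
        term: weight * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, weight in tf.items()
    }


# Hypothetical extracted mentions from three papers.
corpus = [
    ["temperature", "ozone"],
    ["temperature"],
    ["temperature", "aerosol"],
]
print(term_frequency(["MODIS", "MODIS", "AIRS"]))
print(tf_idf(corpus[0], corpus))
```

A term such as "temperature" that appears in every paper is down-weighted by TF-IDF relative to a rarer term such as "ozone", which may be why a rarity-aware score suits variables while a plain count is enough for instruments.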
Datasets are rarely mentioned verbatim in the papers.
Extraction Results
Dataset Name
CLAMS MODIS L2 Aerosols (C1306210515-LARC_ASDC)
ISLSCP II MODIS (Collection 4) Albedo, 2002 (C179003149-ORNL_DAAC)
ISLSCP II MODIS (Collection 4) IGBP Land Cover, 2000-2001 (C179002785-ORNL_DAAC)
CER-NEWS_CCCM_Aqua-FM3-MODIS-CAL-CS_RelB1 (C5769450-LARC_ASDC)
LBA-ECO LC-39 MODIS Active Fire and Frequency Data for South America: 2000-2007 (C179125645-ORNL_DAAC)
MODISA_L2_IOP (C1200034400-OB_DAAC)
MODIS/Aqua Aerosol 5-Min L2 Swath 3km V006 (C204650560-LAADS)
MODIS/Terra Aerosol 5-Min L2 Swath 10km V006 (C203234517-LAADS)
MODIS/Aqua Aerosol 5-Min L2 Swath 10km V006 (C203234444-LAADS)
• Semantic entity identification is a difficult problem, and heuristics-based algorithms are brittle

• Use of existing taxonomies is helpful for specific entities (instruments/platforms) and less helpful for others (physical property/phenomena, ...)
  Quality of the taxonomy impacts extraction results
  CF is the least useful
  SWEET covers most concepts and has the best potential for use

• Dataset profile approach is dependent on both the metadata and entity extraction quality
  Metadata creators view dataset keywords differently than dataset users
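One plausible reading of the dataset profile approach is to treat each dataset's metadata keywords as a profile and rank datasets by their overlap with the entities extracted from a paper. The sketch below uses Jaccard similarity for that overlap; the scoring choice, the placeholder dataset names, and the keyword lists are assumptions for illustration, not the poster's actual method or metadata.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_datasets(paper_entities, dataset_profiles, top_k=3):
    """Rank datasets by keyword overlap with a paper's extracted entities."""
    normalized = {entity.lower() for entity in paper_entities}
    scored = [
        (jaccard(normalized, {keyword.lower() for keyword in keywords}), name)
        for name, keywords in dataset_profiles.items()
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 3)) for score, name in scored[:top_k] if score > 0]


# Placeholder profiles; real profiles would come from dataset metadata records.
profiles = {
    "dataset_a": ["MODIS", "aerosol optical depth", "Terra"],
    "dataset_b": ["MODIS", "land cover"],
    "dataset_c": ["sea ice"],
}
print(rank_datasets(["modis", "Aerosol Optical Depth"], profiles))
```

Both inputs matter in this scheme: a missed entity on the paper side or an unexpected keyword on the metadata side lowers the overlap, which is consistent with the dependence on metadata and extraction quality noted above.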


Next Steps: Begin Machine Learning Phase 


• Use these algorithms to semi-automate training set generation
  Have Atmospheric Science students provide URLs to 5-10 papers from their research area
  Provide extractions and have students label results
• Train deep neural networks for entity extraction
  Evaluate results
• Build verb extraction and categorization to identify relationships between different entity types
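For the semi-automated training set step, heuristic extractions could be turned into token-level labels that students then correct. The sketch below converts character spans into BIO tags (B- for the first token of an entity, I- for later tokens, O for everything else), a common input format for sequence-labeling networks. The tokenizer, span format, and function names are assumptions; the poster does not specify a labeling scheme.

```python
import re


def tokenize(text):
    """Split into word and punctuation tokens, keeping character offsets."""
    return [(m.group(0), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_tags(text, spans):
    """Label each token from (start, end, entity_type) character spans.

    Tokens outside every span get "O"; spans are assumed not to overlap.
    """
    rows = []
    for token, start, end in tokenize(text):
        tag = "O"
        for span_start, span_end, entity_type in spans:
            if start >= span_start and end <= span_end:
                prefix = "B-" if start == span_start else "I-"
                tag = prefix + entity_type
                break
        rows.append((token, tag))
    return rows


sentence = "MODIS measures aerosol optical depth."
variable = "aerosol optical depth"
variable_start = sentence.index(variable)
# Hypothetical spans, as a heuristic extractor might produce them.
spans = [
    (0, 5, "Instrument"),
    (variable_start, variable_start + len(variable), "Variable"),
]
for token, tag in bio_tags(sentence, spans):
    print(f"{token}\t{tag}")
```

Presenting pre-filled tags means a labeler only has to fix mistakes instead of annotating every token from scratch.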


